Skip to content

Udovenko_git_pain #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Udovenko_git_pain #2

wants to merge 4 commits into from

Conversation

UdovenkoVolodymyr
Copy link

No description provided.

def my_method(self):
print("my_method called!")
@staticmethod
def write(data, output_name, indent=2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. out_filename could be a better parameter name than output_name.
  2. It's a good idea to provide a default value for the indent param.

print("my_method called!")
@staticmethod
def write(data, output_name, indent=2):
import json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid imports that are near to the place where they are used. Contrary to the local variables (that should be defined closer to the exact place where they are used), all imports should be at the top of your module.

import json
with open(output_name, 'w') as out_file:
parsed = json.loads(json.dumps(data))
json.dump(parsed, out_file, indent=indent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just json.dump(data, out_file, indent=indent)?


def __clr_tags_to_metadata(self, row):
# task 1 and 2 for the tags array || and task 3 for the text
self.sign_result = re.findall(self.sign_patt, row)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Remember, we're discussed that we shouldn't mutate the internal state in methods. So, avoid self.something = assignments. If you need this value later, in another method - this is a sequential dependency. It's a bad thing, don't use the internal state to communicate between methods. Use methods params for that.
  2. The self.sign_result is not used in this method anymore, so, there's no practical need to calculate its value here, in this method. This line of code should be moved to the __remove_dollar_sign_words method.

self.name = input_file_name

# list of patterns for regular expressions
self.tags_patt = re.compile(r",\[.*\]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good idea to pre-create all the data you need in your methods right here, in your __init__().

.replace("'", "'").replace(""", '"').replace('’', '\'')

for sign in self.sign_result:
self.clr_row = self.clr_row.replace(sign, '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you use the internal state to store the value (self.clr_row) that is required for the subsequent method calls. Just make it a local variable and return it after all required mutations.

for row in data_file:
row_dict = dict()

self.__clr_tags_to_metadata(row=row)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code should have been implemented in the following way:

metadata, tags_result = self.__clr_tags_to_metadata(row)  # don't be afraid of multiple return values, sometimes it's a good thing
clr_row = self.__remove_dollar_sign_words(row, tags_result)
body_tags = self.__place_tags_wrd_to_body_tags(clr_row)
orphan_tokens = self.__tokenize_add_orphan_tokens(clr_row)

row_dict["body"] = body # you could just get the `body` from `row` while reading the parsed file
row_dict["body_tags"] = body_tags
row_dict["metadata"] = metadata
row_dict["orphan_tokens"] = orphan_tokens

Put this way, your program is just a pipe that transforms the data, each piece of this pipe is not connected to any other pieces with anything except its params, return value(s) and (seldom) reading the internal state.


def __place_tags_wrd_to_body_tags(self, clr_row):
# task 2 for the text
self.body_tags = re.findall(self.body_patt, clr_row)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing into the internal state - this should be avoided. Make it the return value of the method.


for token in tokens_clr:
if lesk(tokens_clr, token) is None:
self.orphan_tokens.append(token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing into the internal state - this should be avoided. Make it the return value of the method.

@@ -0,0 +1,101 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, this file is obsolete and can be deleted.

@vittorius vittorius mentioned this pull request Dec 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants